A Study and Comparative Analysis of Different Stemmer and Character Recognition Algorithms for Indian Gujarati Script
Abstract
Much work has been reported on optical character recognition for non-Indian scripts such as Chinese, English and Japanese, and for Indian scripts such as Tamil, Hindi and Telugu. In this paper, we present a literature review of stemmer, optical character recognition (OCR) and text mining work on Indian scripts, mainly on the Gujarati language. We discuss the different techniques for OCR and text mining in Gujarati script, summarize most of the published work on this topic, and give future directions for research in the field of Indian scripts.
Similar resources
Script Identification from Bilingual Gujarati-English Documents
In a multilingual country like India, English words are often interspersed within Indian regional languages in most official papers, school textbooks and magazines. A bilingual Optical Character Recognition (OCR) system is therefore needed that can recognize these bilingual documents and store them for future use. In this paper the authors present an OCR system developed for the script ...
Gujarati Character Identification: A Survey
English character recognition techniques have been studied extensively over the last two decades and have achieved remarkable progress and success rates. For regional languages, however, these techniques are still emerging and their success rates remain poor. In Gujarat, there are thousands of people who can speak, write and understand only the Gujarati language. Rapidly growing computation may increase Indian CR met...
Analysis of structural features and classification of Gujarati consonants for offline character recognition
The wide range of applications and the numerous complexities involved in character recognition (CR) make it a continuing and open area of research. Feature selection and classification play a major role in achieving higher accuracy in character recognition. In the era of digitization, there is a compelling need for CR systems for regional scripts. This paper presents analysis of structural features an...
Extraction of Characters and Modifiers from Handwritten Gujarati Words
Research activity related to Optical Character Recognition (OCR) is very limited for almost all Indian languages. Gujarati is one of the scripts for which very little literature is available as far as OCR is concerned. This paper describes one of the important phases of OCR, the segmentation of handwritten words into their basic components, namely basic characters, conjunct character...
Rotation Estimation Of Gujarati Script Document Using Hough Transform
This paper proposes a technique for estimating the skew present in an image of a Gujarati script document using the Hough transform. It includes simple pre-processing tasks such as dilation, erosion and thinning. Once these are applied, the final image is passed through the Hough transform and a fairly close estimate of the skew angle is obtained. It provides promising results when applie...